Abstract

On April 14th, 1912, the Titanic struck an iceberg during its maiden voyage from Southampton to New York City. Two hours and 40 minutes later, the ship had sunk, and approximately 62% of the passengers perished. Prior research has attempted to determine how those who survived the sinking differed from those who died in order to assess which attributes may have been prioritized when making life and death decisions that night. The purpose of this study is to further explore the most commonly studied characteristics (class, gender, and age) using descriptive statistics, data visualization, and predictive models (specifically, logistic regression and conditional inference classification trees). Logistic regression results indicate that all three demographic attributes are significant predictors of survival. Furthermore, classification tree results indicate that gender had the largest effect on survival, followed by class. Interestingly, these results suggest that age was a significant differentiating factor of survival only among males.

Introduction

The Titanic was a British ocean liner that featured the most advanced technology available in 1912. During its maiden voyage, the Titanic collided with an iceberg just before midnight and sank in the North Atlantic Ocean, where the water was about 2 degrees Celsius, resulting in the deaths of over two-thirds of the passengers and crew (Balakumar et al., 2019; Frey et al., 2011; Hall, 1986). Given that many believed the Titanic to be unsinkable due to its size and amenities, inquiries were launched to determine what factors may have contributed to such a large loss of life.

Perhaps the greatest driver of the death toll was the lack of preparedness. There were not enough lifeboats on board to save all of the passengers and crew. The ship carried twenty lifeboats, which was only enough for 52% of the passengers (Frey et al., 2011; Hall, 1986; Symanzik et al., 2019). Additionally, a portion of the lifeboats that were launched that night were not full (Frey et al., 2011; Symanzik et al., 2019). Those who did not get a seat in a lifeboat were likely to perish in the freezing water, particularly because the partially full lifeboats that had been lowered reportedly made no attempt to save people from the water (Hall, 1986; Frey et al., 2011).

The Titanic took approximately two hours and forty minutes to sink, which is a long time compared to other maritime disasters. For example, the Lusitania sank only 18 minutes after being struck by a torpedo (Frey et al., 2011). It has been hypothesized that this longer window left room for social norms to operate, whereas a more imminent danger may have triggered a fight-or-flight response and more self-interested behavior (Frey et al., 2011). For example, evacuating women and children before men was a social norm and code of conduct in 1912 (Farag & Hassan, 2018). It has also been documented that Captain Edward Smith shouted, “Women and children first” after the Titanic collided with the iceberg (Farag & Hassan, 2018). Furthermore, this length of time may have allowed patterns related to passengers’ wealth to emerge. As shown in the graph below, ticket prices for the Titanic varied widely. After adjusting for inflation, a first-class cabin would cost $4,241.74 today, whereas second and third-class tickets would cost $1,696.70 and $1,131.13, respectively. Even among first-class passengers there was a large range of wealth, as the most expensive first-class accommodations would cost $123,010.82 today (see note 1).

ggplotly(fare_graph)

Social status may have played a role that night, as the crew may have been more likely to accommodate the wealthier passengers and less likely to accommodate passengers of lower means. First-class passengers may also have used their wealth to bargain with crew members (Frey et al., 2011). Furthermore, the ship was laid out in a manner that gave the first-class passengers an advantage. Frey et al. (2011) explained that lifeboats were stored closest to the first-class cabins, a location that also gave first-class passengers greater access to information about the disaster. First-class passengers were also more likely to have a relationship with the officers who gave orders for loading lifeboats, which may have given them an advantage in survival. Based on these accounts, it is worth exploring what passenger-level characteristics may have been associated with higher rates of survival.

Research Questions

  1. What were the characteristics of the passengers of the Titanic who survived or perished?
  2. Were passengers’ class, gender, and age significant predictors of survival?
  3. Which of the three demographic characteristics had the greatest influence on survival?

Methods

Analytic Sample

The data used for this study are from Encyclopedia Titanica (2021). The data were collected from primary sources, including the ship’s manifest and records. Since the population of interest for this study is passengers who were aboard the Titanic during its sinking, passengers who disembarked at Cherbourg, Queenstown, and Southampton (n = 35) as well as crew members (n = 1,123) were excluded from the analyses. Missing data were handled through listwise deletion of two passengers who did not have their ages recorded. Thus, the analytic sample consisted of 1,315 passengers. Over half (53.76%) of the Titanic’s passengers were in third-class accommodations, whereas 24.64% and 21.60% were in first and second-class, respectively.

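# Packages assumed by the table chunks in this report: dplyr (pipeline verbs),
# janitor (adorn_totals), knitr (kable), and kableExtra (kable_classic, kable_styling).
library(dplyr)
library(janitor)
library(knitr)
library(kableExtra)

# Count passengers in each class and express each count as a percent of the sample.
dat %>%
  group_by(class) %>%
  summarize(count = n()) %>%
  mutate(percent = (count / sum(count)) * 100) %>%
  adorn_totals() %>%
  kable(caption = "Breakdown of Passengers by Class",
        col.names = c("Class", "Count", "Percent"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")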
Breakdown of Passengers by Class

Class        Count    Percent
1st Class      324      24.64
2nd Class      284      21.60
3rd Class      707      53.76
Total         1315     100.00
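The exclusion and listwise-deletion steps described for the analytic sample can be sketched in base R as follows. This is only an illustration on a six-row toy roster: the column names (`role`, `disembarked`, `age`) and all values are hypothetical and are not taken from the Encyclopedia Titanica data or from the report's own cleaning script.

```r
# Toy roster with HYPOTHETICAL column names and values (not the study data).
roster <- data.frame(
  id          = 1:6,
  role        = c("passenger", "passenger", "crew",
                  "passenger", "passenger", "passenger"),
  disembarked = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE),
  age         = c(29, 41, 35, NA, 2, 54)
)

# Step 1: keep only passengers who were still aboard at the time of the sinking.
aboard <- subset(roster, role == "passenger" & !disembarked)

# Step 2: listwise deletion, dropping rows with a missing analysis variable (age).
analytic <- aboard[complete.cases(aboard[, "age", drop = FALSE]), ]

nrow(analytic)
```

Applying the exclusions before the deletion step keeps the count of deleted cases limited to people who belong to the population of interest.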

Across all passengers, ages ranged from 0 to 74 years (M = 31.42, SD = 13.92). The table below shows the distribution of age by class. First-class passengers were, on average, substantially older than both second and third-class passengers. This may suggest that the trip served a different purpose for that group of passengers, such as recreation and experience versus business travel and immigration (Hall, 1986).

# Summarize the age distribution (mean, SD, minimum, maximum) within each class.
dat %>%
  group_by(class) %>%
  summarize(avg_age = mean(age), std_age = sd(age), min_age = min(age),
            max_age = max(age)) %>%
  kable(caption = "Average Age by Class",
        col.names = c("Class", "Average Age", "SD Age", "Min. Age", "Max. Age"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")
Average Age by Class

Class        Average Age    SD Age    Min. Age    Max. Age
1st Class          39.14     13.55           0          71
2nd Class          30.01     13.90           0          71
3rd Class          25.12     11.71           0          74

The table below shows the nationalities reported by the Titanic’s passengers. The most common nationalities were English (22.43%), American (18.40%), and Irish (9.28%). The majority of first-class passengers were American (60.19%), whereas the majority of second-class passengers were English (51.06%). Third-class passengers were the most diverse class, with the most common nationalities being English (15.84%), Irish (14.85%), Swedish (12.73%), and Syrian/Lebanese (11.74%). The differences in nationalities were likely due to the large number of individuals in third-class who were immigrating to America (Hall, 1986).

# Tabulate passengers by nationality (rows with a missing nationality are dropped),
# sorted from most to least common.
dat %>%
  filter(!is.na(nationality2)) %>%
  group_by(nationality2) %>%
  summarize(count = n()) %>%
  mutate(percent = (count / sum(count)) * 100) %>%
  arrange(desc(percent)) %>%
  kable(caption = "Breakdown of Passenger Nationalities",
        col.names = c("Nationality", "Count", "Percent"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_styling(fixed_thead = TRUE, full_width = FALSE, html_font = "Cambria",
                bootstrap_options = c("striped", "hover"))
Breakdown of Passenger Nationalities

Nationality          Count    Percent
English                295      22.43
American               242      18.40
Irish                  122       9.28
Other - Multiple       108       8.21
Swedish                100       7.60
Syrian/Lebanese         85       6.46
Finnish                 58       4.41
Canadian                37       2.81
Bulgarian               31       2.36
Croatian                28       2.13
French                  26       1.98
Norwegian               26       1.98
Belgian                 25       1.90
Scottish                17       1.29
Channel Islander        15       1.14
Swiss                   13       0.99
Danish                  10       0.76
Italian                  9       0.68
German                   8       0.61
Spanish                  8       0.61
Welsh                    8       0.61
Polish                   6       0.46
Bosnian                  4       0.30
Hong Kongese             4       0.30
South African            4       0.30
Greek                    3       0.23
Lithuanian               3       0.23
Uruguayan                3       0.23
Australian               2       0.15
Chinese                  2       0.15
Portuguese               2       0.15
Slovenian                2       0.15
Austrian                 1       0.08
Dutch                    1       0.08
Egyptian                 1       0.08
Haitian                  1       0.08
Hungarian                1       0.08
Japanese                 1       0.08
Latvian                  1       0.08
Mexican                  1       0.08
Turkish                  1       0.08

Measures

Dependent Variable

The primary outcome of interest was survival status, which was recorded as a dichotomous factor variable (lost or survived).

Independent Variables

Independent variables included class (which serves as a proxy for socioeconomic status), binary gender, and age. Class was recorded as a three-level factor variable (first-class, second-class, and third-class), whereas gender was recorded as a dichotomous factor variable (female or male). Age (in years) was recorded as a continuous variable.

Analysis

Data analysis was performed in R version 4.1.1 (R Core Team, 2021) using the RStudio integrated development environment (RStudio Team, 2021). Descriptive statistics were computed to describe the analytic sample as well as to compare survival rates across demographic subgroups of interest. Density ridges were graphed in order to visualize survival rate differences for gender and class subgroups across age ranges. Next, a logistic regression model was estimated to examine whether the main effects of gender (reference group = female), class (reference group = first-class), and age were significant predictors of surviving the disaster. To assess how these groups interact to influence survival, as well as which variable was the most influential, a conditional inference classification tree was estimated. Conditional inference trees combine recursive partitioning and statistical inference. This type of classification tree uses a splitting criterion based on Bonferroni-corrected statistical significance testing, which minimizes biases often associated with traditional classification trees (Hothorn et al., 2006). Alpha was set at .05 (i.e., a 95% confidence level) for all multivariate analyses.
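As a rough illustration of the main-effects logistic regression described above, the sketch below fits the same model form with base R's `glm()`. The data are simulated (the `toy` data frame and the coefficients used to generate it are invented for this example), so the resulting odds ratios are not the study's estimates. The Wald intervals from `confint.default()` are also an assumption made here for simplicity and may differ slightly from the intervals reported in the results table.

```r
# Simulated stand-in data: variable names mirror the report, values are invented.
set.seed(1912)
n <- 500
toy <- data.frame(
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE),
                  levels = c("Female", "Male")),               # reference = Female
  class  = factor(sample(c("1st Class", "2nd Class", "3rd Class"), n, replace = TRUE),
                  levels = c("1st Class", "2nd Class", "3rd Class")),  # reference = 1st
  age    = round(runif(n, min = 0, max = 74))
)

# Generate a survival outcome from made-up log-odds so the model has signal to find.
log_odds <- 2.5 - 2.5 * (toy$gender == "Male") -
  1.0 * (as.integer(toy$class) - 1) - 0.03 * toy$age
toy$survived <- factor(rbinom(n, 1, plogis(log_odds)),
                       levels = c(0, 1), labels = c("Lost", "Saved"))

# Main-effects logistic regression; with a factor outcome, glm() models the
# probability of the second level ("Saved").
m1 <- glm(survived ~ age + gender + class, data = toy, family = binomial)

# Exponentiate coefficients and Wald 95% limits to get odds ratios.
or_table <- exp(cbind(OR = coef(m1), confint.default(m1)))
round(or_table, 2)
```

Setting the factor levels explicitly is what fixes the reference groups (female and first-class), so each odds ratio is read relative to those categories.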

Results

Descriptive Statistics of Survival

Within the analytic sample, 61.98% of passengers died during the sinking, whereas 38.02% survived.

# Count passengers by survival outcome and express each count as a percent of the sample.
dat %>%
  group_by(survived) %>%
  summarize(count = n()) %>%
  mutate(percent = (count / sum(count)) * 100) %>%
  adorn_totals() %>%
  kable(caption = "Overall Survival Outcomes",
        col.names = c("Outcomes", "Count", "Percent"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")
Overall Survival Outcomes

Outcomes    Count    Percent
Lost          815      61.98
Saved         500      38.02
Total        1315     100.00

When examining the descriptive statistics broken down by class and gender, there are substantial disparities in survival. As shown in the table below, 62.04% of first-class passengers survived, compared to 41.55% of second-class passengers and only 25.60% of third-class passengers.

# summarize() drops the last grouping level (survived), so the result is still
# grouped by class and each percent is computed within that class.
dat %>%
  group_by(class, survived) %>%
  summarize(count = n()) %>%
  mutate(percent = (count / sum(count)) * 100) %>%
  arrange(class, survived) %>%
  kable(caption = "Survival Rate by Class",
        col.names = c("Class", "Survived", "Count", "Percent"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")
Survival Rate by Class

Class        Survived    Count    Percent
1st Class    Lost          123      37.96
1st Class    Saved         201      62.04
2nd Class    Lost          166      58.45
2nd Class    Saved         118      41.55
3rd Class    Lost          526      74.40
3rd Class    Saved         181      25.60

As shown in the table below, 72.75% of female passengers survived compared to 18.96% of male passengers.

# Percent is computed within each gender (the grouping left after summarize()).
dat %>%
  group_by(gender, survived) %>%
  summarize(count = n()) %>%
  mutate(percent = (count / sum(count)) * 100) %>%
  arrange(gender, survived) %>%
  kable(caption = "Survival Rate by Gender",
        col.names = c("Gender", "Survived", "Count", "Percent"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")
Survival Rate by Gender

Gender    Survived    Count    Percent
Female    Lost          127      27.25
Female    Saved         339      72.75
Male      Lost          688      81.04
Male      Saved         161      18.96

The table below shows survival rates broken down by both class and gender. Only five first-class female passengers (3.47%) lost their lives, while 96.53% survived. Among first-class male passengers, 65.56% lost their lives while 34.44% survived. Among second-class female passengers, 11.32% perished and 88.68% survived. For second-class male passengers, 86.52% perished and 13.48% survived. Among third-class female passengers, 50.93% lost their lives while 49.07% survived. Nearly 85% of third-class male passengers (84.73%) lost their lives while 15.27% survived. These differences in rates highlight how class and gender may interact to predict survival.

# Percent is computed within each class-by-gender cell (the grouping left after summarize()).
dat %>%
  group_by(class, gender, survived) %>%
  summarize(count = n()) %>%
  mutate(percent = (count / sum(count)) * 100) %>%
  arrange(class, gender) %>%
  kable(caption = "Survival Rate by Class and Gender",
        col.names = c("Class", "Gender", "Survived", "Count", "Percent"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")
Survival Rate by Class and Gender

Class        Gender    Survived    Count    Percent
1st Class    Female    Lost            5       3.47
1st Class    Female    Saved         139      96.53
1st Class    Male      Lost          118      65.56
1st Class    Male      Saved          62      34.44
2nd Class    Female    Lost           12      11.32
2nd Class    Female    Saved          94      88.68
2nd Class    Male      Lost          154      86.52
2nd Class    Male      Saved          24      13.48
3rd Class    Female    Lost          110      50.93
3rd Class    Female    Saved         106      49.07
3rd Class    Male      Lost          416      84.73
3rd Class    Male      Saved          75      15.27
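The within-cell percentages in the class-by-gender table can be cross-checked directly from the reported counts. The base R sketch below re-enters the twelve counts from that table by hand (the `counts` data frame is constructed for this check and is not the report's `dat` object) and recomputes each percent within its class-by-gender cell.

```r
# Counts re-entered from the "Survival Rate by Class and Gender" table.
counts <- data.frame(
  class    = rep(c("1st Class", "2nd Class", "3rd Class"), each = 4),
  gender   = rep(rep(c("Female", "Male"), each = 2), times = 3),
  survived = rep(c("Lost", "Saved"), times = 6),
  count    = c(5, 139, 118, 62, 12, 94, 154, 24, 110, 106, 416, 75)
)

# Divide each count by the total of its class-by-gender cell.
cell_total <- ave(counts$count, counts$class, counts$gender, FUN = sum)
counts$percent <- round(100 * counts$count / cell_total, 2)

# Survival percentages only, one row per class-by-gender cell.
saved <- subset(counts, survived == "Saved")
saved
```

The counts also sum to the analytic sample size of 1,315, which is a quick consistency check between this table and the class breakdown.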

Furthermore, age was an important factor that contributed to survival. As shown in the figure below, first-class passengers had the widest age distribution among those who survived, regardless of gender. Males between the ages of 18 and 30 had the highest survival rates, whereas males between the ages of 5 and 18 had the lowest survival rates. Females between the ages of 14 and 40 had the highest survival rates. Interestingly, age seems to have had the greatest impact on survival among second-class males, as the survival group’s ridge peaks earlier than in any other class and gender combination.

surv_ageclass_hist

Logistic Regression Model

Results of the main effects logistic regression model predicting survival are shown in the table below. When controlling for the effects of gender and class, age was a significant predictor of survival (OR = 0.97, 95% CI [0.95, 0.98], p < .001). With each additional year of age, passengers’ odds of survival decreased by three percent. When controlling for the effects of age and gender, class was a significant predictor of survival. Compared to first-class passengers, second-class passengers’ (OR = 0.27, 95% CI [0.18, 0.40], p < .001) and third-class passengers’ (OR = 0.10, 95% CI [0.07, 0.15], p < .001) odds of surviving the disaster were 73% lower and 90% lower, respectively. Gender was also a significant predictor of survival (OR = 0.08, 95% CI [0.06, 0.11], p < .001), even when controlling for class and age. Male passengers faced 92% lower odds of survival compared to female passengers. Taken together, these results indicate that class, age, and gender each significantly affected the odds of survival, even when controlling for one another.

tbl_m1
Characteristic    OR      95% CI        p-value
age               0.97    0.95, 0.98    <0.001
gender
  Female          (reference)
  Male            0.08    0.06, 0.11    <0.001
class
  1st Class       (reference)
  2nd Class       0.27    0.18, 0.40    <0.001
  3rd Class       0.10    0.07, 0.15    <0.001

OR = Odds Ratio, CI = Confidence Interval
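The "percent lower odds" statements in the text follow from the odds ratios by the arithmetic (OR - 1) × 100. The short base R sketch below applies that conversion to the four odds ratios reported in the table. The vector names are labels chosen for this example only.

```r
# Odds ratios as reported in the logistic regression table.
or <- c(age = 0.97, male = 0.08, second_class = 0.27, third_class = 0.10)

# Percent change in the odds of survival relative to the reference group
# (or per one-year increase, for age). Negative values mean lower odds.
pct_change <- round((or - 1) * 100)
pct_change

# Odds ratios multiply across units, so a ten-year age difference
# corresponds to 0.97^10 (about 0.74), not 10 * 3 percent.
ten_year_or <- or[["age"]]^10
round(ten_year_or, 2)
```

Note that these are changes in odds, not in probability, so a 92% reduction in odds does not mean that the survival probability for males was 92 percentage points lower.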

Classification Tree

The figure below shows the results of the conditional classification tree used to model survival. The tree’s terminal nodes identified the following eight subgroups:

  1. First-class females
  2. Second-class females
  3. Third-class females
  4. First-class males, 54 years of age or younger
  5. First-class males, older than 54 years of age
  6. Second-class males, nine years of age or younger
  7. Third-class males, nine years of age or younger
  8. Second and third-class males, older than nine years of age

The terminal nodes’ barplots indicate the breakdown of survival for each subgroup (black = survival, gray = loss of life). Each gender was stratified by class, suggesting that class was an important predictor of survival for both males and females. However, class had a much smaller effect among females (p = .044) than among males (p < .001). Female subgroups were not split by age, whereas male subgroups were split by age following class, which indicates that age had a larger effect among males than females. Furthermore, the age split for first-class males (54 years of age) is substantially higher than the age split among second and third-class males (nine years of age), which aligns with the wider age distribution of first-class males previously observed in the density ridge graphs. Interestingly, second and third-class males over the age of nine were not split by class. When examining the model as a whole, the root node was gender (p < .001), suggesting it was the strongest predictor of survival entered into the model. Thus, based on the order of the tree splits, one can hypothesize that gender was the largest predictor of survival, followed by class and age, respectively.

plot(ctree, main = "Predicting Survival From Gender, Class, and Age")
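For readers who want to see how a tree of this kind might be fitted, the sketch below uses `ctree()` from the partykit package with a Bonferroni-adjusted test and alpha = 0.05. Several things here are assumptions: the report does not state whether party or partykit was used, the `toy` data are simulated rather than the study data, and the object is named `tree_fit` only to avoid shadowing the `ctree()` function. The fit is skipped if partykit is not installed.

```r
# Simulated stand-in data: variable names mirror the report, values are invented.
set.seed(42)
n <- 400
toy <- data.frame(
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  class  = factor(sample(c("1st Class", "2nd Class", "3rd Class"), n, replace = TRUE)),
  age    = round(runif(n, min = 0, max = 74))
)
p_saved <- plogis(2 - 2.5 * (toy$gender == "Male") - 0.8 * (as.integer(toy$class) - 1))
toy$survived <- factor(rbinom(n, 1, p_saved),
                       levels = c(0, 1), labels = c("Lost", "Saved"))

# Fit a conditional inference tree only when partykit is available (assumed package).
tree_fit <- NULL
if (requireNamespace("partykit", quietly = TRUE)) {
  tree_fit <- partykit::ctree(
    survived ~ gender + class + age,
    data    = toy,
    # A split is made only when the Bonferroni-adjusted p-value is below alpha.
    control = partykit::ctree_control(testtype = "Bonferroni", alpha = 0.05)
  )
  print(tree_fit)  # text summary of the splits; plot(tree_fit) draws the tree
}
```

Because each split requires a significant association between a predictor and the outcome, the variable chosen at the root is the one with the strongest association in the full sample, which is the reasoning behind reading the split order as a rough ranking of predictors.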

Discussion

(Shawn McWeeney)

References

Note: Shawn McWeeney was tasked with inputting the APA citations into R.

For this project, we used several R packages: rio (Chan et al., 2021), readr (Wickham & Hester, 2021), here (Müller, 2020), tidyverse (Wickham et al., 2019), janitor (Firke, 2021), plotly (Sievert, 2020), kableExtra (Zhu, 2021), ggridges (Wilke, 2021), and gtsummary (Sjoberg et al., 2021), along with the conditional inference tree framework of Hothorn, Hornik, and Zeileis (2006). All analyses were run in R (R Core Team, 2021).

Dieckmann, C. (2020). The Mystery of the Titanic: What Really Happened. URJ-UCCS: Undergraduate Research Journal at UCCS, 13(1), Article 1. https://urj.uccs.edu/index.php/urj/article/view/491

Encyclopedia Titanica. (2021). Titanic people explorer. https://www.encyclopedia-titanica.org/explorer/

Frey, B. S., Savage, D. A., & Torgler, B. (2010). Interaction of natural survival instincts and internalized social norms exploring the Titanic and Lusitania disasters. Proceedings of the National Academy of Sciences of the United States of America, 107(11), 4862–4865. https://doi.org/10.1073/pnas.0911303107.

Frey, B. S., Savage, D. A., & Torgler, B. (2011). Behavior under Extreme Conditions: The Titanic Disaster. Journal of Economic Perspectives, 25(1), 209–222. https://doi.org/10.1257/jep.25.1.209.

Frey, B. S., Savage, D. A., & Torgler, B. (2011). Who perished on the Titanic? The importance of social norms. Rationality and Society, 23(1), 35–49. https://doi.org/10.1177/1043463110396059.

Farag, N., & Hassan, G. (2018). Predicting the Survivors of the Titanic Kaggle, Machine Learning From Disaster. Proceedings of the 7th International Conference on Software and Information Engineering, 32–37. https://doi.org/10.1145/3220267.3220282.

Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651-674.

Hall, W. (1986). Social class and survival on the SS Titanic. Social Science & Medicine, 22(6), 687-690. https://doi.org/10.1016/0277-9536(86)90041-9.

Lassieur, A. (2012). Can You Survive the Titanic? An interactive survival adventure. Horn Book Magazine, 88(2), 139–139.

Pipe, J. (2011). Titanic, a very peculiar history (Vol. 5). Andrews UK Limited.

Sahr, R. (2021). Inflation Conversion Factors for years 1774 to estimated 2028. https://liberalarts.oregonstate.edu/spp/polisci/research/inflation-conversion-factors-convert-dollars-1774-estimated-2024-dollars-recent-year

Takis, S. L. (1999). Titanic: A statistical exploration. The Mathematics Teacher, 92(8), 660-664.

U.S. Bureau of Labor Statistics. (2021). Consumer price index. https://www.bls.gov/cpi/

Chan, Chung-hong, Geoffrey CH Chan, Thomas J. Leeper, and Jason Becker. 2021. Rio: A Swiss-Army Knife for Data File I/O.

Firke, Sam. 2021. Janitor: Simple Tools for Examining and Cleaning Dirty Data. https://CRAN.R-project.org/package=janitor.

Hothorn, Torsten, Kurt Hornik, and Achim Zeileis. 2006. “Unbiased Recursive Partitioning: A Conditional Inference Framework.” Journal of Computational and Graphical Statistics 15 (3): 651–74.

Müller, Kirill. 2020. Here: A Simpler Way to Find Your Files. https://CRAN.R-project.org/package=here.

R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, Plotly, and Shiny. Chapman; Hall/CRC. https://plotly-r.com.

Sjoberg, Daniel D., Karissa Whiting, Michael Curry, Jessica A. Lavery, and Joseph Larmarange. 2021. “Reproducible Summary Tables with the Gtsummary Package.” The R Journal 13 (1): 570–80. https://doi.org/10.32614/RJ-2021-053.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Wickham, Hadley, and Jim Hester. 2021. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.

Wilke, Claus O. 2021. Ggridges: Ridgeline Plots in ’Ggplot2’. https://CRAN.R-project.org/package=ggridges.

Zhu, Hao. 2021. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.


  1. This graph was made based on the average fare price (in USD) for each type of accommodation reported by Pipe (2011). We calculated inflation using consumer price indexes (CPIs) from the U.S. Bureau of Labor Statistics (2021) and Sahr (2021) for each respective year with the following formula: \[\text{Adjusted Price} = \frac{\text{Inflation Year CPI}}{\text{1912 CPI}} \times \text{Original Price}\]
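The inflation adjustment in the note above amounts to a single ratio. The base R sketch below wraps it in a small helper. The function name `adjust_price` and the CPI values in the example call are placeholders chosen for easy arithmetic. They are not the actual index values from the U.S. Bureau of Labor Statistics or Sahr (2021), and the fares reported in the Introduction cannot be reproduced from them.

```r
# Adjusted Price = (Inflation Year CPI / 1912 CPI) * Original Price
adjust_price <- function(original_price, cpi_1912, cpi_target) {
  (cpi_target / cpi_1912) * original_price
}

# PLACEHOLDER inputs for illustration only (not real CPI figures or fares):
# a fare of 100 with a 25-fold rise in the price index.
adjusted <- adjust_price(original_price = 100, cpi_1912 = 10, cpi_target = 250)
adjusted
```

Because the function is vectorized over `original_price`, a vector of fares for the three classes could be adjusted in one call with the same pair of index values.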